
    Performance measurements of distributed simulation strategies

    Journal Article: A multiprocessor-based, distributed simulation testbed is described that facilitates controlled experimentation with distributed simulation algorithms. The performance of simulation strategies using deadlock avoidance and deadlock detection and recovery techniques is examined using various synthetic and actual workloads. The distributed simulators are compared with a uniprocessor-based event list implementation. Results of a series of experiments demonstrate that message population and the degree to which processes can look ahead in simulated time play critical roles in the performance of distributed simulators using these algorithms. An "avalanche" phenomenon was observed in the deadlock detection and recovery simulator, and was found to be a necessary condition for achieving good performance. The central server queueing model was also examined. The poor behavior of this test case that has been observed by others is reproduced in the testbed, and explained in terms of message population and lookahead. Based on these observations, a modification to the server process program is suggested that improves performance by as much as an order of magnitude when first-come-first-serve (FCFS) servers are used. These results demonstrate that conservative distributed simulation algorithms using deadlock avoidance or detection and recovery techniques can provide significant speedups over sequential event list implementations for some workloads, even in the presence of only a moderate amount of parallelism and many feedback loops. However, a moderate to high degree of parallelism is not sufficient to guarantee good performance.
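
    To make the lookahead idea concrete, the sketch below (a hypothetical Python illustration, not the testbed described above) shows why a conservative logical process can only consume an event once every input channel has advanced past its timestamp, and how messages stamped with timestamp plus lookahead let neighbouring processes advance in turn.

        import heapq

        class LogicalProcess:
            def __init__(self, lookahead, input_channels):
                self.lookahead = lookahead          # minimum delay this LP adds to any message
                self.pending = []                   # heap of (timestamp, event)
                self.channel_clocks = dict.fromkeys(input_channels, 0.0)

            def receive(self, channel, timestamp, event):
                # Messages on each channel arrive in timestamp order.
                self.channel_clocks[channel] = timestamp
                heapq.heappush(self.pending, (timestamp, event))

            def safe_time(self):
                # Nothing earlier than the smallest channel clock can still
                # arrive, so events at or below it are safe to process.
                return min(self.channel_clocks.values())

            def process_safe_events(self, send):
                while self.pending and self.pending[0][0] <= self.safe_time():
                    ts, event = heapq.heappop(self.pending)
                    # A larger lookahead yields a larger jump in downstream
                    # clocks, which is why it dominates performance.
                    send(ts + self.lookahead, event)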

    Time warp on a shared memory multiprocessor

    Journal Article: A variation of the Time Warp parallel discrete event simulation mechanism is presented that is optimized for execution on a shared memory multiprocessor. In particular, the direct cancellation mechanism is proposed, which eliminates the need for anti-messages and provides an efficient mechanism for cancelling erroneous computations. The mechanism thereby eliminates many of the overheads associated with conventional, message-based implementations of Time Warp. More importantly, this mechanism effects rapid repairs of the parallel computation when an error is discovered. Initial performance measurements of an implementation of the mechanism executing on a BBN Butterfly multiprocessor are presented. These measurements indicate that the mechanism achieves good performance, particularly for many workloads where conservative clock synchronization algorithms perform poorly. Speedups as high as 56.8 using 64 processors were obtained. However, our studies also indicate that state saving overheads represent a significant stumbling block for many parallel simulations using Time Warp.
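
    The direct cancellation idea can be pictured in a few lines of Python (a hypothetical sketch, not the Butterfly implementation): because all processors share memory, an event can hold pointers to the events it scheduled, and an erroneous computation is erased by walking those pointers rather than by routing anti-messages through the system.

        class Event:
            def __init__(self, timestamp):
                self.timestamp = timestamp
                self.caused = []          # direct pointers to events this one scheduled
                self.cancelled = False

        def schedule(parent, child):
            parent.caused.append(child)   # recorded at the moment of scheduling

        def cancel(event):
            # Direct cancellation: mark the whole tree of descendants invalid
            # immediately, with no anti-messages in transit.
            for child in event.caused:
                if not child.cancelled:
                    child.cancelled = True
                    cancel(child)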

    The virtual time machine

    Journal Article: Existing multiprocessors and multicomputers require the programmer or compiler to perform data dependence analysis at compile time. We propose a parallel computer that performs this task at runtime. In particular, the Virtual Time Machine (VTM) detects violations of data dependence constraints as they occur, and automatically recovers from them. A sophisticated memory system that is addressed using both a spatial and a temporal coordinate is used to efficiently implement this mechanism. Initially targeted for discrete event simulation applications, many of the ideas used in the machine architecture have direct application in the more general realm of parallel computation. The long term goal of this work is to develop a general purpose parallel computer that will support a wide range of parallel programming paradigms. This paper outlines the motivations behind the VTM architecture, the underlying computation model, a proposed implementation, and initial performance results. A recurring theme that pervades the entire paper is our contention that existing shared memory and message-based machines do not pay adequate attention to the dimension of time. We argue that this architectural deficiency is the underlying reason behind many difficult problems in parallel computation today.
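
    A software analogue of the space-time addressing (hypothetical names; the VTM realizes this in the memory system itself) keeps a version list per location: a write is filed under its temporal coordinate, and a read at time t returns the latest value written at or before t, so a dependence violation is simply a write that lands earlier than an already-served read.

        import bisect

        class SpaceTimeMemory:
            def __init__(self):
                self.versions = {}                    # address -> ([times], [values])

            def write(self, addr, time, value):
                times, values = self.versions.setdefault(addr, ([], []))
                i = bisect.bisect_right(times, time)  # keep versions ordered in time
                times.insert(i, time)
                values.insert(i, value)

            def read(self, addr, time):
                times, values = self.versions.get(addr, ([], []))
                i = bisect.bisect_right(times, time)
                return values[i - 1] if i else None   # latest write at or before `time`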

    HOP: a process model for synchronous hardware semantics, and experiments in process composition

    Technical report: We present a language, "Hardware viewed as Objects and Processes" (HOP), for specifying the structure, behavior, and timing of hardware systems. HOP embodies a simple process model for lock-step synchronous processes. An absproc specification written in HOP describes the externally observable behavior of a process. A collection of absprocs may be composed to form a larger process, using the operators parallel composition, renaming, and hiding. In this paper we present the communication primitives of HOP, illustrate HOP through several examples, and then present its operational semantics. Then we present the role played by HOP in three VLSI design activities: (i) inferring concise behavioral descriptions of systems from their structural descriptions; (ii) static detection of control timing errors during behavioral inference; (iii) productive and runtime-efficient functional simulation using the inferred behavior.
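
    The three composition operators can be summarized with a toy model (a hypothetical sketch that tracks only event interfaces, not HOP's timing or protocols): parallel composition merges two absprocs and synchronizes them on shared events, renaming relabels the interface, and hiding makes events internal.

        class Proc:
            def __init__(self, name, events):
                self.name = name
                self.events = set(events)

        def compose(p, q):
            # Parallel composition: events common to both interfaces become
            # synchronization points of the larger process.
            return Proc(f"({p.name} || {q.name})", p.events | q.events)

        def rename(p, mapping):
            return Proc(p.name, {mapping.get(e, e) for e in p.events})

        def hide(p, hidden):
            # Hidden events are no longer observable from outside.
            return Proc(p.name, p.events - set(hidden))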

    Design and verification of the rollback chip using HOP: a case study of formal methods applied to hardware design

    Technical report: The use of formal methods in hardware design improves the quality of designs in many ways: it promotes better understanding of the design; it permits systematic design refinement through the discovery of invariants; and it allows design verification (informal or formal). In this paper we illustrate the use of formal methods in the design of a custom hardware system called the 'Rollback Chip' (RBC), conducted using a simple hardware design specification language called 'HOP'. An informal description of the requirements of the RBC is first given, followed by a behavioral description of the RBC stating its desired behavior. The behavioral description is refined into progressively more efficient designs, terminating in a structural description. Key refinement steps are based on system invariants that are discovered during the design, and proved correct during design verification. The first step in design verification is to apply a program called PARCOMP to derive a behavioral description from the structural description of the RBC. The derived behavior is then compared against the desired behavior using equational verification techniques. This work demonstrates that formal methods can be fruitfully applied to a non-trivial hardware design. It also illustrates the particular advantages of our approach based on HOP and PARCOMP. Last, but not least, it formally verifies the RBC mechanism itself.
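
    The mechanism under verification can be approximated in software (an illustrative analogue with hypothetical names; the RBC provides this service in hardware): every write is logged against the current mark, and rolling back to a mark undoes the newer writes in reverse order.

        class RollbackMemory:
            def __init__(self):
                self.state = {}
                self.log = []                  # (mark, addr, previous value)
                self.mark = 0

            def new_mark(self):
                self.mark += 1
                return self.mark

            def write(self, addr, value):
                self.log.append((self.mark, addr, self.state.get(addr)))
                self.state[addr] = value

            def rollback(self, to_mark):
                # Undo every write made at or after the given mark, newest first.
                while self.log and self.log[-1][0] >= to_mark:
                    _, addr, old = self.log.pop()
                    if old is None:
                        self.state.pop(addr, None)
                    else:
                        self.state[addr] = old
                self.mark = to_mark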

    HOP: a process model for synchronous hardware systems

    Technical report: Modules in HOP are black boxes that are understood and used only in terms of their interface. The interface consists of data ports, events, and a protocol specification that uses events and asserts/queries values to/from ports. Events are realized as different combinations of control wires or as predicates defined over data conduits. Modules await either command events or status events. Data conduits are realized as bus structures that deliver the same data items at the receiving end as the items sent at the sending end (i.e. the busses do not have any wire-permutations, tappings, etc.). HOP is useful for writing both requirements (a priori) specifications and design (a posteriori) specifications. The manner in which requirements are expressed usually has no bearing on the actual implementation chosen later. Design specifications capture known facts about a system that has been built or has been designed in detail. In a HOP-based design methodology, design proceeds hierarchically, and on many occasions (but not always) top-down. For most large systems, the requirements specification consists of the specification of a collection of modules and not one module; for these systems, the single module view is only derived a posteriori.
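
    As a reading aid, the black-box view can be transcribed into a record (a hypothetical sketch; HOP's own syntax and protocol language are richer than this): a module is driven purely through its ports and its command and status events.

        class ModuleInterface:
            def __init__(self, in_ports, out_ports, commands, statuses):
                self.in_ports = set(in_ports)      # values asserted by the environment
                self.out_ports = set(out_ports)    # values asserted by the module
                self.commands = set(commands)      # events the module awaits
                self.statuses = set(statuses)      # events the module reports

            def accepts(self, event):
                # A black-box module can only be commanded through its interface.
                return event in self.commands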

    Optimal performance of distributed simulation programs

    Journal Article: This paper describes a technique to analyze the potential speedup of distributed simulation programs. A distributed simulation strategy is proposed which minimizes execution time through the use of an oracle to control the simulation. Because the strategy relies on an oracle, it cannot be used for practical simulations. However, the strategy facilitates performance evaluations of distributed simulation strategies by providing a useful point of comparison, and it can be used to determine the suitability of specific applications for implementation on a parallel computer. Based on the proposed strategy, a tool has been developed to determine the maximum performance which can be achieved by a distributed simulation program. In this paper we describe the technique and its use in evaluating the parallelism available in distributed simulators of parallel computer systems.
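
    One standard way to realize such an oracle bound is critical-path analysis over the event dependence graph (our hypothetical rendering; the paper's tool may differ in detail): with perfect knowledge, each event starts as soon as every event it depends on has finished, so the simulation can never finish faster than its longest dependency chain.

        def critical_path_time(events, deps, service_time):
            """events: ids in topological (dependency) order;
            deps: id -> ids it depends on; service_time: id -> cost."""
            finish = {}
            for e in events:
                start = max((finish[d] for d in deps.get(e, ())), default=0.0)
                finish[e] = start + service_time[e]
            return max(finish.values(), default=0.0)

        # Speedup bound = sum(service_time.values()) / critical_path_time(...)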

    Simon II kernel reference manual

    Journal Article: The principal objective of Simon II is to provide a flexible and adaptable framework for constructing simulators for a wide variety of parallel systems. A simulator consists of a set of software building blocks. Each building block, i.e., an object, simulates a specific component of the parallel system. Objects may be defined in terms of other objects, supporting a hierarchical view of the system.
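
    The building-block idea amounts to a small compositional pattern (a hypothetical sketch, not Simon II's actual interface): an object either models one component directly or delegates to the sub-objects it is defined in terms of.

        class SimObject:
            def __init__(self, name, children=()):
                self.name = name
                self.children = list(children)     # sub-objects, possibly empty

            def handle(self, event):
                # A composite object routes the event to the sub-object that
                # simulates the addressed component; a leaf models it directly.
                ...

        node = SimObject("node", [SimObject("cpu"), SimObject("memory")])
        machine = SimObject("machine", [node, SimObject("network")])   # hierarchy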

    On synthesizing systolic arrays from recurrence equations with linear dependencies

    Journal Article: We present a technique for synthesizing systolic architectures from Recurrence Equations. A class of such equations (Recurrence Equations with Linear Dependencies) is defined, and the problem of mapping such equations onto a two-dimensional architecture is studied. We show that such a mapping is provided by means of a linear allocation and timing function. An important result is that under such a mapping the dependencies remain linear. After obtaining a two-dimensional architecture by applying such a mapping, a systolic array can be derived if the communication can be spatially and temporally localized. We show that a simple test consisting of finding the zeroes of a matrix is sufficient to determine whether this localization can be achieved by pipelining, and we give a construction that generates the array when such a pipelining is possible. The technique is illustrated by automatically deriving a well-known systolic array for factoring a band matrix into lower and upper triangular factors.
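
    The flavour of the mapping can be seen on the familiar matrix-product recurrence (our illustration; the paper's worked example is the band-matrix factorization):

        % Recurrence with linear (here uniform) dependencies:
        C(i,j,k) = C(i,j,k-1) + A(i,k)\,B(k,j), \qquad 1 \le i,j,k \le n
        % A linear timing function and a linear allocation onto a 2-D array:
        t(i,j,k) = i + j + k, \qquad a(i,j,k) = (i,\,j)
        % The schedule respects every dependence, since it takes positive time:
        t(i,j,k) - t(i,j,k-1) = 1 > 0
        % Because t and a are linear, the mapped dependencies stay linear; the
        % broadcasts of A(i,k) and B(k,j) must then be pipelined to localize
        % all communication, yielding a systolic array.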

    Systolic array synthesis by static analysis of program dependencies

    Journal Article: We present a technique for mapping recurrence equations to systolic arrays. While this problem has been studied in fairly great detail, the recurrence equations that are analysed here are a generalization of those studied previously. In an earlier paper [14] we showed how systolic arrays can be synthesized from such generalized recurrence equations by a combination of affine transformations and explicit pipelining. This paper extends the results in two directions. Firstly, a multistage pipelining technique is proposed, which permits the synthesis of systolic arrays with irregular data flow. Secondly, we develop analysis techniques for the synthesis of systolic arrays whose computation is governed by control signals, in a systematic manner which is amenable to mechanization. The full paper also discusses how these techniques can be applied to the mapping problem for more general architectures.